51 research outputs found

    Learning probabilistic models of hydrogen bond stability from molecular dynamics simulation trajectories

    Get PDF
    Hydrogen bonds (H-bonds) play a key role in both the formation and stabilization of protein structures. H-bonds involving atoms from residues that are close to each other in the main-chain sequence stabilize secondary structure elements. H-bonds between atoms from distant residues stabilize a protein’s tertiary structure. However, H-bonds greatly vary in stability. They form and break while a protein deforms. For instance, the transition of a protein from a nonfunctional to a functional state may require some H-bonds to break and others to form. The intrinsic strength of an individual H-bond has been studied from an energetic viewpoint, but energy alone may not be a very good predictor. Other local interactions may reinforce (or weaken) an H-bond. This paper describes inductive learning methods to train a protein-independent probabilistic model of H-bond stability from molecular dynamics (MD) simulation trajectories. The training data describes H-bond occurrences at successive times along these trajectories by the values of attributes called predictors. A trained model is constructed in the form of a regression tree in which each non-leaf node is a Boolean test (split) on a predictor. Each occurrence of an H-bond maps to a path in this tree from the root to a leaf node. Its predicted stability is associated with the leaf node. Experimental results demonstrate that such models can predict H-bond stability quite well. In particular, their performance is roughly 20 % better than that of models based on H-bond energy alone. In addition, they can accurately identify a large fraction of the least stable H-bonds in a give

    A Search for Energy Minimized Sequences of Proteins

    Get PDF
    In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of the real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use edge-weighted connectivity graph for ranking the residue sites with reduced amino acid alphabet and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from protein data bank (PDB), and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to Basic Local Alignment Search Tool (BLAST), we obtained some sequences from non-redundant protein sequence database that are similar to ours with an E-value of the order of 10-7. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment or that Nature does not push the optimization in the sequence space, once it is able to perform the function

    Stabilisation of the Fc Fragment of Human IgG1 by Engineered Intradomain Disulfide Bonds

    Get PDF
    We report the stabilization of the human IgG1 Fc fragment by engineered intradomain disulfide bonds. One of these bonds, which connects the N-terminus of the CH3 domain with the F-strand, led to an increase of the melting temperature of this domain by 10°C as compared to the CH3 domain in the context of the wild-type Fc region. Another engineered disulfide bond, which connects the BC loop of the CH3 domain with the D-strand, resulted in an increase of Tm of 5°C. Combined in one molecule, both intradomain disulfide bonds led to an increase of the Tm of about 15°C. All of these mutations had no impact on the thermal stability of the CH2 domain. Importantly, the binding of neonatal Fc receptor was also not influenced by the mutations. Overall, the stabilized CH3 domains described in this report provide an excellent basic scaffold for the engineering of Fc fragments for antigen-binding or other desired additional or improved properties. Additionally, we have introduced the intradomain disulfide bonds into an IgG Fc fragment engineered in C-terminal loops of the CH3 domain for binding to Her2/neu, and observed an increase of the Tm of the CH3 domain for 7.5°C for CysP4, 15.5°C for CysP2 and 19°C for the CysP2 and CysP4 disulfide bonds combined in one molecule

    Computational Design of a PDZ Domain Peptide Inhibitor that Rescues CFTR Activity

    Get PDF
    The cystic fibrosis transmembrane conductance regulator (CFTR) is an epithelial chloride channel mutated in patients with cystic fibrosis (CF). The most prevalent CFTR mutation, ΔF508, blocks folding in the endoplasmic reticulum. Recent work has shown that some ΔF508-CFTR channel activity can be recovered by pharmaceutical modulators (“potentiators” and “correctors”), but ΔF508-CFTR can still be rapidly degraded via a lysosomal pathway involving the CFTR-associated ligand (CAL), which binds CFTR via a PDZ interaction domain. We present a study that goes from theory, to new structure-based computational design algorithms, to computational predictions, to biochemical testing and ultimately to epithelial-cell validation of novel, effective CAL PDZ inhibitors (called “stabilizers”) that rescue ΔF508-CFTR activity. To design the “stabilizers”, we extended our structural ensemble-based computational protein redesign algorithm to encompass protein-protein and protein-peptide interactions. The computational predictions achieved high accuracy: all of the top-predicted peptide inhibitors bound well to CAL. Furthermore, when compared to state-of-the-art CAL inhibitors, our design methodology achieved higher affinity and increased binding efficiency. The designed inhibitor with the highest affinity for CAL (kCAL01) binds six-fold more tightly than the previous best hexamer (iCAL35), and 170-fold more tightly than the CFTR C-terminus. We show that kCAL01 has physiological activity and can rescue chloride efflux in CF patient-derived airway epithelial cells. Since stabilizers address a different cellular CF defect from potentiators and correctors, our inhibitors provide an additional therapeutic pathway that can be used in conjunction with current methods

    Protein Design Using Continuous Rotamers

    Get PDF
    Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility to occur during this optimization, or they may fail to find the best sequence with the lowest energy. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with close to the same efficiency of DEE. Availability: Software is available under the Lesser GNU Public License v3. Contact the authors for source code

    Hydrogen bond networks determine emergent mechanical and thermodynamic properties across a protein family

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Gram-negative bacteria use periplasmic-binding proteins (bPBP) to transport nutrients through the periplasm. Despite immense diversity within the recognized substrates, all members of the family share a common fold that includes two domains that are separated by a conserved hinge. The hinge allows the protein to cycle between open (apo) and closed (ligated) conformations. Conformational changes within the proteins depend on a complex interplay of mechanical and thermodynamic response, which is manifested as an increase in thermal stability and decrease of flexibility upon ligand binding.</p> <p>Results</p> <p>We use a distance constraint model (DCM) to quantify the give and take between thermodynamic stability and mechanical flexibility across the bPBP family. Quantitative stability/flexibility relationships (QSFR) are readily evaluated because the DCM links mechanical and thermodynamic properties. We have previously demonstrated that QSFR is moderately conserved across a mesophilic/thermophilic RNase H pair, whereas the observed variance indicated that different enthalpy-entropy mechanisms allow similar mechanical response at their respective melting temperatures. Our predictions of heat capacity and free energy show marked diversity across the bPBP family. While backbone flexibility metrics are mostly conserved, cooperativity correlation (long-range couplings) also demonstrate considerable amount of variation. Upon ligand removal, heat capacity, melting point, and mechanical rigidity are, as expected, lowered. Nevertheless, significant differences are found in molecular cooperativity correlations that can be explained by the detailed nature of the hydrogen bond network.</p> <p>Conclusion</p> <p>Non-trivial mechanical and thermodynamic variation across the family is explained by differences within the underlying H-bond networks. The mechanism is simple; variation within the H-bond networks result in altered mechanical linkage properties that directly affect intrinsic flexibility. Moreover, varying numbers of H-bonds and their strengths control the likelihood for energetic fluctuations as H-bonds break and reform, thus directly affecting thermodynamic properties. Consequently, these results demonstrate how unexpected large differences, especially within cooperativity correlation, emerge from subtle differences within the underlying H-bond network. This inference is consistent with well-known results that show allosteric response within a family generally varies significantly. Identifying the hydrogen bond network as a critical determining factor for these large variances may lead to new methods that can predict such effects.</p

    A Measure of the Promiscuity of Proteins and Characteristics of Residues in the Vicinity of the Catalytic Site That Regulate Promiscuity

    Get PDF
    Promiscuity, the basis for the evolution of new functions through ‘tinkering’ of residues in the vicinity of the catalytic site, is yet to be quantitatively defined. We present a computational method Promiscuity Indices Estimator (PROMISE) - based on signatures derived from the spatial and electrostatic properties of the catalytic residues, to estimate the promiscuity (PromIndex) of proteins with known active site residues and 3D structure. PromIndex reflects the number of different active site signatures that have congruent matches in close proximity of its native catalytic site, the quality of the matches and difference in the enzymatic activity. Promiscuity in proteins is observed to follow a lognormal distribution (μ = 0.28, σ = 1.1 reduced chi-square = 3.0E-5). The PROMISE predicted promiscuous functions in any protein can serve as the starting point for directed evolution experiments. PROMISE ranks carboxypeptidase A and ribonuclease A amongst the more promiscuous proteins. We have also investigated the properties of the residues in the vicinity of the catalytic site that regulates its promiscuity. Linear regression establishes a weak correlation (R2∼0.1) between certain properties of the residues (charge, polar, etc) in the neighborhood of the catalytic residues and PromIndex. A stronger relationship states that most proteins with high promiscuity have high percentages of charged and polar residues within a radius of 3 Å of the catalytic site, which is validated using one-tailed hypothesis tests (P-values∼0.05). Since it is known that these characteristics are key factors in catalysis, their relationship with the promiscuity index cross validates the methodology of PROMISE

    Tradeoff Between Stability and Multispecificity in the Design of Promiscuous Proteins

    Get PDF
    Natural proteins often partake in several highly specific protein-protein interactions. They are thus subject to multiple opposing forces during evolutionary selection. To be functional, such multispecific proteins need to be stable in complex with each interaction partner, and, at the same time, to maintain affinity toward all partners. How is this multispecificity acquired through natural evolution? To answer this compelling question, we study a prototypical multispecific protein, calmodulin (CaM), which has evolved to interact with hundreds of target proteins. Starting from high-resolution structures of sixteen CaM-target complexes, we employ state-of-the-art computational methods to predict a hundred CaM sequences best suited for interaction with each individual CaM target. Then, we design CaM sequences most compatible with each possible combination of two, three, and all sixteen targets simultaneously, producing almost 70,000 low energy CaM sequences. By comparing these sequences and their energies, we gain insight into how nature has managed to find the compromise between the need for favorable interaction energies and the need for multispecificity. We observe that designing for more partners simultaneously yields CaM sequences that better match natural sequence profiles, thus emphasizing the importance of such strategies in nature. Furthermore, we show that the CaM binding interface can be nicely partitioned into positions that are critical for the affinity of all CaM-target complexes and those that are molded to provide interaction specificity. We reveal several basic categories of sequence-level tradeoffs that enable the compromise necessary for the promiscuity of this protein. We also thoroughly quantify the tradeoff between interaction energetics and multispecificity and find that facilitating seemingly competing interactions requires only a small deviation from optimal energies. We conclude that multispecific proteins have been subjected to a rigorous optimization process that has fine-tuned their sequences for interactions with a precise set of targets, thus conferring their multiple cellular functions

    A Generic Program for Multistate Protein Design

    Get PDF
    Some protein design tasks cannot be modeled by the traditional single state design strategy of finding a sequence that is optimal for a single fixed backbone. Such cases require multistate design, where a single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to to find a sequence that is compatible with both backbone conformations. We present in this paper a generic implementation of multistate design that is suited for a wide range of protein design tasks and demonstrate in silico its capabilities at two design tasks: one of redesigning an obligate homodimer into an obligate heterodimer such that the new monomers would not homodimerize, and one of redesigning a promiscuous interface to bind to only a single partner and to no longer bind the rest of its partners. Both tasks contained negative design in that multistate design was asked to find sequences that would produce high energies for several of the states being modeled. Success at negative design was assessed by computationally redocking the undesired protein-pair interactions; we found that multistate design's accuracy improved as the diversity of conformations for the undesired protein-pair interactions increased. The paper concludes with a discussion of the pitfalls of negative design, which has proven considerably more challenging than positive design

    Changes in Lysozyme Flexibility upon Mutation Are Frequent, Large and Long-Ranged

    Get PDF
    We investigate changes in human c-type lysozyme flexibility upon mutation via a Distance Constraint Model, which gives a statistical mechanical treatment of network rigidity. Specifically, two dynamical metrics are tracked. Changes in flexibility index quantify differences within backbone flexibility, whereas changes in the cooperativity correlation quantify differences within pairwise mechanical couplings. Regardless of metric, the same general conclusions are drawn. That is, small structural perturbations introduced by single point mutations have a frequent and pronounced affect on lysozyme flexibility that can extend over long distances. Specifically, an appreciable change occurs in backbone flexibility for 48% of the residues, and a change in cooperativity occurs in 42% of residue pairs. The average distance from mutation to a site with a change in flexibility is 17–20 Å. Interestingly, the frequency and scale of the changes within single point mutant structures are generally larger than those observed in the hen egg white lysozyme (HEWL) ortholog, which shares 61% sequence identity with human lysozyme. For example, point mutations often lead to substantial flexibility increases within the β-subdomain, which is consistent with experimental results indicating that it is the nucleation site for amyloid formation. However, β-subdomain flexibility within the human and HEWL orthologs is more similar despite the lowered sequence identity. These results suggest compensating mutations in HEWL reestablish desired properties
    corecore